The survival of humanity has high dependency on the existence of natural resources.There has however resulted in the need to improve the management of these resources as well as strengthen the knowledge base on the interactions between humans and the earth through resources available to them.The understanding of these interactions and variations of vegetation (NDVI) and land cover therefore becomes very essential source of information because it tend to highlight more spatial activities in the environment. Key data analysis techniques will be employed on the cleaned dataset which have the same coordination systems as the land cover/land use (MCD12Q1) data.
This project will utilize the libraries below to conduct spatial data analysis and visualization.
setwd("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences")
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library(sp)
library(raster)
library(viridis)
## Loading required package: viridisLite
library(terra)
## terra 1.7.65
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks terra::extract(), raster::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks raster::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The data set used for this project has same coordination systems as the landcover(MCD12Q1) data. The MODIS Land Cover Type Product (MCD12Q1) supplies global maps of land cover at annual time steps and 500-m spatial resolution for 2001-present. The product contains 13 Science Data Sets (SDS;Table 1), including 5 legacy classification schemes (IGBP, UMD, LAI, BGC, and PFT; Tables 3- 7) and a new three layer legend based on the Land Cover Classification System (LCCS) from the Food and Agriculture Organization (Tables 8- 10; Di Gregorio, 2005; Sulla-Menashe et al., 2011). Also included are a Quality Assurance (QA) layer, the posterior probabilities for the three LCCS layers, and the binary land water mask used by the product. MCD12Q1 has been Stage 2 Validated based on cross-validation of the training dataset used to create the maps. Two data set are used in this project namely; the landcover and NVDI datasets.
The data set required for this project work will be loaded into R
#######Loading Dataset1 ###############
Landcover_NDVI_Rainfall_filter <- readRDS("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences/Landcover_NDVI_Rainfall_filter.rds")
head(Landcover_NDVI_Rainfall_filter)
## # A tibble: 6 × 6
## # Groups: Month, x, y [6]
## Month x y Landcover MeanNDVI Rainfall
## <fct> <dbl> <dbl> <fct> <dbl> <dbl>
## 1 1 -7699562. 4206648. Savannas 0.121 99.9
## 2 1 -7699562. 4207111. Savannas 0.0598 99.9
## 3 1 -7699562. 4207574. Savannas 0.0140 99.9
## 4 1 -7699099. 4206184. Savannas 0.109 99.9
## 5 1 -7699099. 4206648. Savannas 0.211 99.9
## 6 1 -7699099. 4207111. Savannas 0.153 99.9
class(Landcover_NDVI_Rainfall_filter)
## [1] "grouped_df" "tbl_df" "tbl" "data.frame"
names(Landcover_NDVI_Rainfall_filter)
## [1] "Month" "x" "y" "Landcover" "MeanNDVI" "Rainfall"
dim(Landcover_NDVI_Rainfall_filter)
## [1] 4857113 6
#######Loading Dataset2 ###############
NDVI_timeseries_Month <- readRDS("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences/NDVI_timeseries_Month.rds")
head(NDVI_timeseries_Month)# first 6 observations
## Year JulianDay NDVI Month YearMonth
## 1 2013 1 0.1092190 1 2013-01-01
## 2 2013 17 0.3531177 1 2013-01-01
## 3 2013 33 0.3280171 2 2013-02-01
## 4 2013 49 0.2745036 2 2013-02-01
## 5 2013 65 0.2579496 3 2013-03-01
## 6 2013 81 0.3314169 3 2013-03-01
class(NDVI_timeseries_Month)
## [1] "data.frame"
names(NDVI_timeseries_Month)# names of columns used in the dataset
## [1] "Year" "JulianDay" "NDVI" "Month" "YearMonth"
#str(NDVI_timeseries_Month) # structure of dataset
dim(NDVI_timeseries_Month) # dimension of the data set
## [1] 230 5
unique(NDVI_timeseries_Month$Year) # checking the different years within the dataset
## [1] 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Below is the descriptive analysis of the data set used for this project work. This study employs the R statistical software to conduct descriptive analysis on both data set coupled with checking the presence of any missing cases found within the projected dataset. The Landcover dataset has initial 4857113 observations and 6 variables. The NDVI dataset has 230 initial observations coupled with 5 variables.
Below is a computation of the basic statistics(minimum, maximum, mean, median) of all numeric variables found in this dataset.From the output below it can be suggested that the minimum x and y coordinate is -7699562 and 4201551 respectively.
summary(Landcover_NDVI_Rainfall_filter)
## Month x y
## 7 : 406160 Min. :-7699562 Min. :4201551
## 9 : 406141 1st Qu.:-7454470 1st Qu.:4344252
## 8 : 406135 Median :-7341885 Median :4441547
## 6 : 405966 Mean :-7350710 Mean :4440260
## 10 : 405949 3rd Qu.:-7243663 3rd Qu.:4539770
## 5 : 405405 Max. :-7037489 Max. :4640309
## (Other):2421357
## Landcover MeanNDVI Rainfall
## Savannas :3360759 Min. :0.0000 Min. : 4.425
## Deciduous needleleaf forests: 486789 1st Qu.:0.3305 1st Qu.: 75.326
## Mixed forests : 332414 Median :0.4694 Median :100.695
## Permanent wetlands : 242807 Mean :0.5096 Mean :105.544
## Closed shrublands : 180853 3rd Qu.:0.7054 3rd Qu.:134.437
## Grasslands : 161332 Max. :0.9761 Max. :245.911
## (Other) : 92159
#A summary ofthe landcover dataset by landcover type
by(Landcover_NDVI_Rainfall_filter, Landcover_NDVI_Rainfall_filter$Landcover, summary)
## Landcover_NDVI_Rainfall_filter$Landcover: Evergreen needleleaf forests
## Month x y
## 4 : 24 Min. :-7571688 Min. :4269195
## 5 : 24 1st Qu.:-7555009 1st Qu.:4285874
## 6 : 24 Median :-7517944 Median :4323866
## 7 : 24 Mean :-7487870 Mean :4340228
## 8 : 24 3rd Qu.:-7467906 3rd Qu.:4340545
## 9 : 24 Max. :-7210304 Max. :4629189
## (Other):135
## Landcover MeanNDVI Rainfall
## Evergreen needleleaf forests:279 Min. :0.0001 Min. : 8.69
## Evergreen broadleaf forests : 0 1st Qu.:0.2810 1st Qu.: 87.41
## Deciduous needleleaf forests: 0 Median :0.5393 Median :122.57
## Deciduous broadleaf forests : 0 Mean :0.4964 Mean :119.55
## Mixed forests : 0 3rd Qu.:0.7139 3rd Qu.:146.32
## Closed shrublands : 0 Max. :0.8729 Max. :231.18
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Evergreen broadleaf forests
## Month x y
## 1 : 52 Min. :-7571688 Min. :4266415
## 2 : 52 1st Qu.:-7543426 1st Qu.:4272438
## 4 : 52 Median :-7521650 Median :4336839
## 5 : 52 Mean :-7452320 Mean :4357886
## 6 : 52 3rd Qu.:-7410919 3rd Qu.:4370313
## 7 : 52 Max. :-7066677 Max. :4628726
## (Other):304
## Landcover MeanNDVI Rainfall
## Evergreen broadleaf forests :616 Min. :0.00045 Min. : 6.045
## Evergreen needleleaf forests: 0 1st Qu.:0.33806 1st Qu.: 87.581
## Deciduous needleleaf forests: 0 Median :0.57660 Median :114.028
## Deciduous broadleaf forests : 0 Mean :0.53075 Mean :116.286
## Mixed forests : 0 3rd Qu.:0.74777 3rd Qu.:146.196
## Closed shrublands : 0 Max. :0.88250 Max. :233.292
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Deciduous needleleaf forests
## Month x y
## 4 : 40569 Min. :-7692613 Min. :4209428
## 5 : 40569 1st Qu.:-7528600 1st Qu.:4280778
## 6 : 40569 Median :-7469296 Median :4331742
## 7 : 40569 Mean :-7467562 Mean :4333825
## 8 : 40569 3rd Qu.:-7428061 3rd Qu.:4365564
## 9 : 40569 Max. :-7047681 Max. :4640309
## (Other):243375
## Landcover MeanNDVI Rainfall
## Deciduous needleleaf forests:486789 Min. :0.00185 Min. : 5.417
## Evergreen needleleaf forests: 0 1st Qu.:0.46130 1st Qu.: 86.549
## Evergreen broadleaf forests : 0 Median :0.70995 Median :120.657
## Deciduous broadleaf forests : 0 Mean :0.66534 Mean :117.640
## Mixed forests : 0 3rd Qu.:0.85970 3rd Qu.:147.056
## Closed shrublands : 0 Max. :0.95215 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Deciduous broadleaf forests
## Month x y
## 4 :148 Min. :-7645355 Min. :4234910
## 5 :148 1st Qu.:-7618483 1st Qu.:4250663
## 6 :148 Median :-7608290 Median :4260855
## 7 :148 Mean :-7544252 Mean :4308428
## 8 :148 3rd Qu.:-7530917 3rd Qu.:4344252
## 9 :148 Max. :-7056484 Max. :4640309
## (Other):876
## Landcover MeanNDVI Rainfall
## Deciduous broadleaf forests :1764 Min. :0.00215 Min. : 6.045
## Evergreen needleleaf forests: 0 1st Qu.:0.57970 1st Qu.: 95.409
## Evergreen broadleaf forests : 0 Median :0.69750 Median :124.352
## Deciduous needleleaf forests: 0 Mean :0.67031 Mean :119.769
## Mixed forests : 0 3rd Qu.:0.81060 3rd Qu.:147.002
## Closed shrublands : 0 Max. :0.92635 Max. :231.176
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Mixed forests
## Month x y
## 5 : 27707 Min. :-7692613 Min. :4209428
## 6 : 27707 1st Qu.:-7523967 1st Qu.:4278925
## 7 : 27707 Median :-7472076 Median :4325256
## 8 : 27707 Mean :-7458441 Mean :4328370
## 10 : 27707 3rd Qu.:-7393313 3rd Qu.:4359541
## 11 : 27707 Max. :-7042122 Max. :4640309
## (Other):166172
## Landcover MeanNDVI Rainfall
## Mixed forests :332414 Min. :0.0007 Min. : 5.417
## Evergreen needleleaf forests: 0 1st Qu.:0.4543 1st Qu.: 87.537
## Evergreen broadleaf forests : 0 Median :0.6578 Median :120.829
## Deciduous needleleaf forests: 0 Mean :0.6269 Mean :118.096
## Deciduous broadleaf forests : 0 3rd Qu.:0.7945 3rd Qu.:147.872
## Closed shrublands : 0 Max. :0.9278 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Closed shrublands
## Month x y
## 4 :15073 Min. :-7692613 Min. :4215451
## 5 :15073 1st Qu.:-7505898 1st Qu.:4276145
## 6 :15073 Median :-7457250 Median :4335912
## 7 :15073 Mean :-7408129 Mean :4379790
## 8 :15073 3rd Qu.:-7307137 3rd Qu.:4421625
## 9 :15073 Max. :-7037489 Max. :4640309
## (Other):90415
## Landcover MeanNDVI Rainfall
## Closed shrublands :180853 Min. :0.0004 Min. : 5.417
## Evergreen needleleaf forests: 0 1st Qu.:0.4439 1st Qu.: 82.779
## Evergreen broadleaf forests : 0 Median :0.6216 Median :114.723
## Deciduous needleleaf forests: 0 Mean :0.5977 Mean :114.723
## Deciduous broadleaf forests : 0 3rd Qu.:0.7622 3rd Qu.:147.234
## Mixed forests : 0 Max. :0.9256 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Open shrublands
## Month x y
## 7 : 5626 Min. :-7697246 Min. :4201551
## 8 : 5626 1st Qu.:-7538793 1st Qu.:4288191
## 9 : 5626 Median :-7402579 Median :4375294
## 11 : 5626 Mean :-7413754 Mean :4394426
## 6 : 5625 3rd Qu.:-7325206 3rd Qu.:4447107
## 10 : 5624 Max. :-7038878 Max. :4640309
## (Other):33579
## Landcover MeanNDVI Rainfall
## Open shrublands :67332 Min. :0.00005 Min. : 4.425
## Evergreen needleleaf forests: 0 1st Qu.:0.37605 1st Qu.: 80.820
## Evergreen broadleaf forests : 0 Median :0.51767 Median :110.232
## Deciduous needleleaf forests: 0 Mean :0.51417 Mean :110.697
## Deciduous broadleaf forests : 0 3rd Qu.:0.66760 3rd Qu.:137.161
## Mixed forests : 0 Max. :0.91800 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Woody savannas
## Month x y
## 7 :1031 Min. :-7694929 Min. :4206648
## 8 :1031 1st Qu.:-7446594 1st Qu.:4345642
## 6 :1030 Median :-7265438 Median :4524944
## 11 :1025 Mean :-7299477 Mean :4483072
## 9 :1022 3rd Qu.:-7160730 3rd Qu.:4603707
## 5 :1021 Max. :-7039342 Max. :4640309
## (Other):5509
## Landcover MeanNDVI Rainfall
## Woody savannas :11669 Min. :0.0001 Min. : 4.54
## Evergreen needleleaf forests: 0 1st Qu.:0.2163 1st Qu.: 74.36
## Evergreen broadleaf forests : 0 Median :0.4122 Median :101.11
## Deciduous needleleaf forests: 0 Mean :0.4116 Mean :104.06
## Deciduous broadleaf forests : 0 3rd Qu.:0.6050 3rd Qu.:130.57
## Mixed forests : 0 Max. :0.8750 Max. :245.65
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Savannas
## Month x y
## 5 : 280522 Min. :-7699562 Min. :4201551
## 7 : 280522 1st Qu.:-7390533 1st Qu.:4403092
## 8 : 280522 Median :-7302040 Median :4483709
## 9 : 280522 Mean :-7318253 Mean :4471498
## 10 : 280522 3rd Qu.:-7221887 3rd Qu.:4553669
## 11 : 280522 Max. :-7038878 Max. :4640309
## (Other):1677627
## Landcover MeanNDVI Rainfall
## Savannas :3360759 Min. :0.0000 Min. : 4.425
## Evergreen needleleaf forests: 0 1st Qu.:0.2948 1st Qu.: 73.276
## Evergreen broadleaf forests : 0 Median :0.4183 Median : 96.375
## Deciduous needleleaf forests: 0 Mean :0.4694 Mean :101.708
## Deciduous broadleaf forests : 0 3rd Qu.:0.6449 3rd Qu.:130.371
## Mixed forests : 0 Max. :0.9517 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Grasslands
## Month x y
## 6 :13457 Min. :-7694003 Min. :4207111
## 7 :13457 1st Qu.:-7367830 1st Qu.:4419308
## 8 :13457 Median :-7332619 Median :4446180
## 10 :13456 Mean :-7305090 Mean :4482613
## 5 :13455 3rd Qu.:-7226520 3rd Qu.:4576835
## 9 :13455 Max. :-7047681 Max. :4640309
## (Other):80595
## Landcover MeanNDVI Rainfall
## Grasslands :161332 Min. :0.0000 Min. : 4.54
## Evergreen needleleaf forests: 0 1st Qu.:0.3342 1st Qu.: 73.10
## Evergreen broadleaf forests : 0 Median :0.4582 Median : 98.38
## Deciduous needleleaf forests: 0 Mean :0.4502 Mean :100.18
## Deciduous broadleaf forests : 0 3rd Qu.:0.5767 3rd Qu.:124.92
## Mixed forests : 0 Max. :0.8972 Max. :245.65
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Permanent wetlands
## Month x y
## 2 : 20235 Min. :-7692613 Min. :4208964
## 4 : 20235 1st Qu.:-7492462 1st Qu.:4297920
## 5 : 20235 Median :-7423428 Median :4356761
## 6 : 20235 Mean :-7392529 Mean :4396291
## 7 : 20235 3rd Qu.:-7288604 3rd Qu.:4483246
## 8 : 20235 Max. :-7038415 Max. :4640309
## (Other):121397
## Landcover MeanNDVI Rainfall
## Permanent wetlands :242807 Min. :0.00035 Min. : 4.425
## Evergreen needleleaf forests: 0 1st Qu.:0.43520 1st Qu.: 81.455
## Evergreen broadleaf forests : 0 Median :0.60495 Median :107.420
## Deciduous needleleaf forests: 0 Mean :0.58597 Mean :111.931
## Deciduous broadleaf forests : 0 3rd Qu.:0.74050 3rd Qu.:143.565
## Mixed forests : 0 Max. :0.93100 Max. :245.911
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Croplands
## Month x y
## 6 :25 Min. :-7259879 Min. :4628262
## 7 :25 1st Qu.:-7254782 1st Qu.:4628262
## 8 :25 Median :-7249222 Median :4629189
## 9 :25 Mean :-7243718 Mean :4631056
## 5 :24 3rd Qu.:-7232543 3rd Qu.:4634285
## 10 :24 Max. :-7212621 Max. :4639845
## (Other):61
## Landcover MeanNDVI Rainfall
## Croplands :209 Min. :0.00085 Min. : 39.50
## Evergreen needleleaf forests: 0 1st Qu.:0.10150 1st Qu.: 60.03
## Evergreen broadleaf forests : 0 Median :0.19670 Median :111.70
## Deciduous needleleaf forests: 0 Mean :0.21819 Mean :110.61
## Deciduous broadleaf forests : 0 3rd Qu.:0.30920 3rd Qu.:142.27
## Mixed forests : 0 Max. :0.71765 Max. :192.67
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Urban and built-up lands
## Month x y
## 7 :1691 Min. :-7654621 Min. :4263172
## 9 :1686 1st Qu.:-7254319 1st Qu.:4631042
## 8 :1666 Median :-7245053 Median :4635212
## 6 :1500 Mean :-7267513 Mean :4601537
## 10 :1494 3rd Qu.:-7233006 3rd Qu.:4637992
## 5 : 953 Max. :-7146830 Max. :4640309
## (Other):1300
## Landcover MeanNDVI Rainfall
## Urban and built-up lands :10290 Min. :0.00000 Min. : 6.045
## Evergreen needleleaf forests: 0 1st Qu.:0.01900 1st Qu.: 87.405
## Evergreen broadleaf forests : 0 Median :0.05443 Median :115.095
## Deciduous needleleaf forests: 0 Mean :0.13592 Mean :117.031
## Deciduous broadleaf forests : 0 3rd Qu.:0.20299 3rd Qu.:164.237
## Mixed forests : 0 Max. :0.97610 Max. :231.176
## (Other) : 0
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Cropland/natural vegetation mosaics
## NULL
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Snow and ice
## NULL
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Barren
## NULL
## ------------------------------------------------------------
## Landcover_NDVI_Rainfall_filter$Landcover: Water bodies
## NULL
sum(is.na(Landcover_NDVI_Rainfall_filter))
## [1] 0
Below is a computation of the basic statistics(minimum, maximum, mean, median) of all numeric variables found in NDVI dataset
library(skimr)
##
## Attaching package: 'skimr'
## The following object is masked from 'package:raster':
##
## bind
skim(NDVI_timeseries_Month)
| Name | NDVI_timeseries_Month |
| Number of rows | 230 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| Date | 1 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| YearMonth | 0 | 1 | 2013-01-01 | 2022-12-01 | 2017-12-16 | 120 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2017.50 | 2.88 | 2013 | 2015.00 | 2017.50 | 2020.00 | 2022.00 | ▇▇▇▇▇ |
| JulianDay | 0 | 1 | 177.00 | 106.36 | 1 | 81.00 | 177.00 | 273.00 | 353.00 | ▇▆▇▆▇ |
| NDVI | 0 | 1 | 0.46 | 0.22 | 0 | 0.32 | 0.42 | 0.59 | 0.87 | ▁▇▇▂▅ |
| Month | 0 | 1 | 6.39 | 3.49 | 1 | 3.00 | 6.00 | 10.00 | 12.00 | ▇▅▅▃▇ |
sum(is.na(NDVI_timeseries_Month))
## [1] 0
For the purpose of this project, there will be a conduct of data analysis and data visualization which includes Spatial mapping NDVI and land cover/land use in 2019. The following will be conducted on the dataset;
Map the average NDVI in 2019 or a specific date NDVI in 2019.
Describe the spatial distributions of NDVI and land cover/land use
# Read the shapefile data for IN boundary
IN<-st_read("Indiana.shp")
## Reading layer `Indiana' from data source
## `C:\Users\charl\OneDrive\Desktop\DSCI 607 Data Analytics for Environmental Sciences\Indiana.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 14 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -88.09789 ymin: 37.77173 xmax: -84.78459 ymax: 41.76137
## Geodetic CRS: NAD83
# Reading in the downloaded land cover and NDVI raster data at 2019
IGBP_raster <- raster("MCD12Q1_LC1_2019_001.tif")
NDVI_raster <- raster("MOD13A1_NDVI_2019_209.tif")
## Transforming data
# Transfer the project of the IN polygon with the same projection with the raster images
crs(IGBP_raster) # cordinate reference system
## Coordinate Reference System:
## Deprecated Proj.4 representation:
## +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## WKT2 2019 representation:
## PROJCRS["unknown",
## BASEGEOGCRS["unknown",
## DATUM["unknown",
## ELLIPSOID["unknown",6371007.181,0,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]]],
## CONVERSION["unknown",
## METHOD["Sinusoidal"],
## PARAMETER["Longitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["False easting",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
crs(IN)
## [1] "GEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]]"
IN_pro <- st_transform(IN,crs(IGBP_raster))
## Cropping raster data
IGBP_raster_IN <- mask(IGBP_raster,IN_pro) ##use crop() if necessary
NDVI_raster_IN <- mask(NDVI_raster,IN_pro)
## Visualize the raster images
plot(IN_pro) # This will create a simple plot of the spatial data
## Warning: plotting the first 9 out of 14 attributes; use max.plot = 14 to plot
## all
## plot of the used raster image files
plot(IGBP_raster_IN,main = "Raster File")
plot(NDVI_raster_IN,main = "Raster File")
#### data exploration
# Creating a spatial map using ggplot2
ggplot() +
geom_sf(data = IN, aes(fill = IN$REGION)) +
theme_minimal() +
labs(title = "Spatial Data Map", fill = "REGION")
## Warning: Use of `IN$REGION` is discouraged.
## ℹ Use `REGION` instead.
### below is the visualization of the region of the state of indiana
##Creating dataframe for the raster files for the purpose further visualization
rast_df1 <- as.data.frame(IGBP_raster, xy = TRUE)
rast_df2 <- as.data.frame(NDVI_raster, xy = TRUE)
### checking the unique month levels found in Landcover dataset
months <- unique(Landcover_NDVI_Rainfall_filter$Month)
months
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12
## visualization of landcover
Landcover_NDVI_Rainfall_filter %>% filter(Month==months[1]) %>%
ggplot() +
geom_point(aes(x = x, y = y, fill = MeanNDVI, col = MeanNDVI))
# Visualising using ggplot2
ggplot() +
geom_raster(data =Landcover_NDVI_Rainfall_filter,
aes(x = x, y = y, fill = Landcover)) +
geom_sf(data = IN_pro, inherit.aes = FALSE, fill = NA) +
scale_fill_viridis(name = "Land Cover Type", discrete = TRUE) +
labs(title = "Land Cover classification",
subtitle = "01-01-2019 - 31-12-2019",
x = "Longitude",
y = "Latitude") +
theme_minimal()
## Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
## ℹ Consider using `geom_tile()` instead.
Visualization based on Seasons- Winter, Spring, Summer, Fall
## visualization based on seasons- Winter, Spring, Summer, Fall
Landcover_NDVI_Rainfall_filter$season <- ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('1', '2', '3'), 'Winter',
ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('4', '5', '6'), 'Spring',
ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('7', '8', '9'), 'Summer', 'Fall')))
Landcover_NDVI_Rainfall_filter$season <- as.factor(Landcover_NDVI_Rainfall_filter$season)
Landcover_NDVI_Rainfall_filter %>% filter(season == 'Fall') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) +
geom_tile()+ theme_minimal()+
scale_fill_viridis_c(option = 'viridis') + labs(title = 'Rainfall in Indiana', subtitle = 'Fall 2019') + xlab("") + ylab("")
Landcover_NDVI_Rainfall_filter %>% filter(season == 'Spring') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal()+
scale_fill_viridis_c(option = 'viridis') + labs(title = 'Rainfall in Indiana', subtitle = 'Spring 2019') + xlab("") + ylab("")
Landcover_NDVI_Rainfall_filter %>% filter(season == 'Winter') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal()+
scale_fill_viridis_c(option = 'viridis')+ labs(title = 'Rainfall in Indiana', subtitle = 'Winter 2019') + xlab("") + ylab("")
Landcover_NDVI_Rainfall_filter %>% filter(season == 'Summer') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal() +
scale_fill_viridis_c(option = 'viridis')+ labs(title = 'Rainfall in Indiana', subtitle = 'Summer 2019') + xlab("") + ylab("")
## changing month to Jan, Feb etc.
df <- NDVI_timeseries_Month
df$Month <- month.abb[df$Month]
###### MeanNDVI ###########
#visualization of the meanndvi variable
ggplot(Landcover_NDVI_Rainfall_filter, aes(x=MeanNDVI)) +
geom_histogram(position= "dodge", alpha=0.5, binwidth = 0.005, color= "blue",fill="green")+
labs(title= "MeanNDVI Distribution",x= "MeanNDVI values",y= "count")+
theme_classic()
## geom_histogram is the geometric function
## Labs allow us to set the title
## theme_ classic suggest the basic theme for the plotting area
###### NDVI ###########
## boxplot indicating the vegetation index which suggest an increase of the vegetation index from Jan to Jun and a decline from Aug to Dec.
ggplot(df, aes(x = Month, y = NDVI)) + geom_boxplot() + theme_minimal()+ scale_x_discrete(limits = month.abb)
##boxplot of meanNDVI against Landcover types
ggplot(Landcover_NDVI_Rainfall_filter, aes(x = Landcover, y = MeanNDVI)) + geom_boxplot() + theme_minimal()
## Anova Analysis
Below is ANOVA analysis of NDVI with month and land cover types using data in 2019.
## group of landcover types agianst the summary of meanNDVI
Landcover_NDVI_Rainfall_filter %>% group_by(Landcover) %>% summarise(mean(MeanNDVI))
## # A tibble: 13 × 2
## Landcover `mean(MeanNDVI)`
## <fct> <dbl>
## 1 Evergreen needleleaf forests 0.496
## 2 Evergreen broadleaf forests 0.531
## 3 Deciduous needleleaf forests 0.665
## 4 Deciduous broadleaf forests 0.670
## 5 Mixed forests 0.627
## 6 Closed shrublands 0.598
## 7 Open shrublands 0.514
## 8 Woody savannas 0.412
## 9 Savannas 0.469
## 10 Grasslands 0.450
## 11 Permanent wetlands 0.586
## 12 Croplands 0.218
## 13 Urban and built-up lands 0.136
## Two way Anova Test
two_way <- aov(MeanNDVI ~ Month + Landcover, data= Landcover_NDVI_Rainfall_filter)
summary(two_way)
## Df Sum Sq Mean Sq F value Pr(>F)
## Month 11 160083 14553 1549451 <2e-16 ***
## Landcover 12 28138 2345 249650 <2e-16 ***
## Residuals 4857089 45619 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can now check normality
par(mfrow = c(1, 2)) # combine plots
# histogram
hist(two_way$residuals)
# QQ-plot
library(car)
## Warning: package 'car' was built under R version 4.3.3
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
qqPlot(two_way$residuals
)
## [1] 1116542 1089817
From the histogram and QQ-plot above, we can observe that the normality assumption seems to be met. As shown above, the histogram roughly form a bell curve, indicating that the residuals follow a normal distribution. Furthermore, points in the QQ-plots roughly follow the straight line and most of them are within the confidence bands, also indicating that residuals follow approximately a normal distribution.
model <- lm(MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
model
##
## Call:
## lm(formula = MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
##
## Coefficients:
## (Intercept) Rainfall
## 0.4611341 0.0004597
summary(model)
##
## Call:
## lm(formula = MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56272 -0.17996 -0.04201 0.19264 0.47835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.611e-01 2.623e-04 1758.0 <2e-16 ***
## Rainfall 4.597e-04 2.301e-06 199.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2185 on 4857111 degrees of freedom
## Multiple R-squared: 0.00815, Adjusted R-squared: 0.00815
## F-statistic: 3.991e+04 on 1 and 4857111 DF, p-value: < 2.2e-16
### model coefficients
coefficients(model)
## (Intercept) Rainfall
## 0.4611341166 0.0004596665
Slope and Intercept Plot
ggplot(Landcover_NDVI_Rainfall_filter, aes(Rainfall, MeanNDVI)) +
geom_point() +
geom_abline(intercept = model$coefficients[1],
slope = model$coefficients[2],
color = "red")
The output table above in the first place reports the formula that was used to generate the results(call), the summary of the model residuals and finally the coefficient and intercept of the results. The y-intercept of the linear regression equation is reported as 4.611e-01. The coefficients describes the estimated effect of Rainfall on NDVI. There is also a report on the residual standard error, R-square, f-statistics and p-value. The p-value reported for the model is 2.2e-16
Because the p value is so low (p-value: < 2.2e-16), we can reject the null hypothesis and conclude that Rainfall has a statistically significant effect on NDVI.
Time series analysis with NDVI time series from 2013-2022 over croplands
Plot and decompose time series of NDVI
Split the data into two parts: training data: 2013-2020; test data: 2021-2022
Model the time series analysis and forecasting with the training data and valid it with the test data
Interpret your time series analysis
library(zoo)
##
## Attaching package: 'zoo'
## The following object is masked from 'package:terra':
##
## time<-
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(padr)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:raster':
##
## select
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(knitr)
##
## Attaching package: 'knitr'
## The following object is masked from 'package:terra':
##
## spin
library(tseries)
### Unique dates
dates <- unique(NDVI_timeseries_Month$YearMonth)
dates
## [1] "2013-01-01" "2013-02-01" "2013-03-01" "2013-04-01" "2013-05-01"
## [6] "2013-06-01" "2013-07-01" "2013-08-01" "2013-09-01" "2013-10-01"
## [11] "2013-11-01" "2013-12-01" "2014-01-01" "2014-02-01" "2014-03-01"
## [16] "2014-04-01" "2014-05-01" "2014-06-01" "2014-07-01" "2014-08-01"
## [21] "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01" "2015-01-01"
## [26] "2015-02-01" "2015-03-01" "2015-04-01" "2015-05-01" "2015-06-01"
## [31] "2015-07-01" "2015-08-01" "2015-09-01" "2015-10-01" "2015-11-01"
## [36] "2015-12-01" "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01"
## [41] "2016-05-01" "2016-06-01" "2016-07-01" "2016-08-01" "2016-09-01"
## [46] "2016-10-01" "2016-11-01" "2016-12-01" "2017-01-01" "2017-02-01"
## [51] "2017-03-01" "2017-04-01" "2017-05-01" "2017-06-01" "2017-07-01"
## [56] "2017-08-01" "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01"
## [61] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01"
## [66] "2018-06-01" "2018-07-01" "2018-08-01" "2018-09-01" "2018-10-01"
## [71] "2018-11-01" "2018-12-01" "2019-01-01" "2019-02-01" "2019-03-01"
## [76] "2019-04-01" "2019-05-01" "2019-06-01" "2019-07-01" "2019-08-01"
## [81] "2019-09-01" "2019-10-01" "2019-11-01" "2019-12-01" "2020-01-01"
## [86] "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01" "2020-06-01"
## [91] "2020-07-01" "2020-08-01" "2020-09-01" "2020-10-01" "2020-11-01"
## [96] "2020-12-01" "2021-01-01" "2021-02-01" "2021-03-01" "2021-04-01"
## [101] "2021-05-01" "2021-06-01" "2021-07-01" "2021-08-01" "2021-09-01"
## [106] "2021-10-01" "2021-11-01" "2021-12-01" "2022-01-01" "2022-02-01"
## [111] "2022-03-01" "2022-04-01" "2022-05-01" "2022-06-01" "2022-07-01"
## [116] "2022-08-01" "2022-09-01" "2022-10-01" "2022-11-01" "2022-12-01"
# 2. Check for missing values
missing_values <- sum(is.na(NDVI_timeseries_Month))
if (missing_values > 0) {
print(paste("Number of missing values:", missing_values))
} else {
print("No missing values found.")
}
## [1] "No missing values found."
invalid_ndvi <- sum(NDVI_timeseries_Month$NDVI < -1 | NDVI_timeseries_Month$NDVI > 1)
if (invalid_ndvi > 0) {
print(paste("Number of invalid NDVI values:", invalid_ndvi))
# You can take appropriate actions, such as removing or imputing invalid values
} else {
print("No invalid NDVI values found.")
}
## [1] "No invalid NDVI values found."
library(dplyr)
station1 <- NDVI_timeseries_Month %>%
group_by(YearMonth) %>%
summarise(NDVI = mean(NDVI))
## first 10 rows of the NDVI data
kable(head(station1,10),caption= "The first 10 rows of the NDVI data")
| YearMonth | NDVI |
|---|---|
| 2013-01-01 | 0.2311683 |
| 2013-02-01 | 0.3012603 |
| 2013-03-01 | 0.2946832 |
| 2013-04-01 | 0.3971307 |
| 2013-05-01 | 0.4623701 |
| 2013-06-01 | 0.6359540 |
| 2013-07-01 | 0.8445031 |
| 2013-08-01 | 0.8020901 |
| 2013-09-01 | 0.6559551 |
| 2013-10-01 | 0.4593558 |
# Convert YearMonth field from character to Date in date time format
station1$YearMonth<-as_datetime(station1$YearMonth)
###Choose three-year data
SiteNDVI_10yrs<-station1 %>%
select(YearMonth,NDVI) %>% #only select variable we want
filter(YearMonth>="2013-01-01" & YearMonth< "2022-12-31")
# Check the first of the time frame
head(SiteNDVI_10yrs,5)
## # A tibble: 5 × 2
## YearMonth NDVI
## <dttm> <dbl>
## 1 2013-02-01 00:00:00 0.301
## 2 2013-03-01 00:00:00 0.295
## 3 2013-04-01 00:00:00 0.397
## 4 2013-05-01 00:00:00 0.462
## 5 2013-06-01 00:00:00 0.636
# Check The End of the Time Frame
tail(SiteNDVI_10yrs,5)
## # A tibble: 5 × 2
## YearMonth NDVI
## <dttm> <dbl>
## 1 2022-08-01 00:00:00 0.847
## 2 2022-09-01 00:00:00 0.732
## 3 2022-10-01 00:00:00 0.458
## 4 2022-11-01 00:00:00 0.327
## 5 2022-12-01 00:00:00 0.246
# find invalid NDVI
InvalidNDVI <- SiteNDVI_10yrs%>% filter(NDVI<0)
print(InvalidNDVI)
## # A tibble: 0 × 2
## # ℹ 2 variables: YearMonth <dttm>, NDVI <dbl>
#### Original 10-year data
ggplot(station1,aes(x=YearMonth,y=NDVI))+geom_line()
#### Use ggplot() for time series data visualization to any two-year data in the 10-year time range of 2013-01-01 to 2022-12-31.After subsetting
ggplot(SiteNDVI_10yrs,aes(x=YearMonth,y=NDVI))+geom_line()
Split the data from 2013-01-01 to 2023-12-31(11-year data) into two data sets.
Training data: 2013-01-01 to 2020-12-31.
Test data: 2021-01-01 to 2022-12-31.
### 8 years
SiteNDVI_8yrs_train<-SiteNDVI_10yrs %>%
filter(YearMonth>="2013-01-01" & YearMonth < "2020-12-31")
### 3 years
SiteNDVI_3yrs_test<-SiteNDVI_10yrs %>%
filter(YearMonth>="2021-01-01" & YearMonth < "2022-12-31")
###Time series visualization using ggplot()
NDVI_plot <- SiteNDVI_8yrs_train %>%
ggplot(aes(x = YearMonth, y = NDVI)) +
geom_line(aes(color = "NDVI")) +
scale_x_datetime(name = "YearMonth", date_breaks = "2 year") +
scale_y_continuous(breaks = seq(0, 1, 0.2)) +
theme_minimal() +
labs(title = "Indiana NDIV over 2021", subtitle = "2020-1-1 - 2022-12-31")
ggplotly(NDVI_plot)
We create a seasonal time series (msts), decompose the msts, visualize the decomposed multi-time series and then save the msts to a data frame object.
SiteNDVI_8yrs_train_monthly <- SiteNDVI_8yrs_train %>%
mutate(month = month(as.Date(YearMonth)), year = year(as.Date(YearMonth))) %>%
group_by(year, month) %>%
summarise(mean_NDVI = mean(NDVI, na.rm = T), .groups = "keep") %>%
as.data.frame()
SiteNDVI_8yrs_train_monthly <- SiteNDVI_8yrs_train_monthly[1:96,]
SiteNDVI_3yrs_test_monthly <- SiteNDVI_3yrs_test %>%
mutate(month = month(as.Date(YearMonth)), year = year(as.Date(YearMonth))) %>%
group_by(year, month) %>%
summarise(mean_NDVI = mean(NDVI, na.rm = T), .groups = "keep") %>%
as.data.frame()
##Create the time series object using ts()
NDVI_ts_train<-ts(SiteNDVI_8yrs_train_monthly$mean_NDVI,start = c(2013,1), end = c(2022,12),frequency = 12)
plot(NDVI_ts_train)
##Create the time series object using msts()
NDVI_multi0 <-msts(SiteNDVI_8yrs_train_monthly$mean_NDVI,seasonal.periods = c(12)) #monthly
plot(NDVI_multi0)
library(zoo)
NDVI_ts_train_dec<-decompose(na.StructTS(NDVI_ts_train))%>%
plot()
The plot above shows the original time series (top), the estimated trend component (second from top), the estimated seasonal component (third from top), and the estimated random component (bottom). From the above it can be observed that the estimated trend shows a small decrease in 2018-19, which is followed by a steady increase from 2021 and rapid fall into 2022. There is a further decline in the trend as seen in the 2022 towards 2023.
NDVI_ts_train_stl1 <- stlm(NDVI_ts_train, s.window = 'periodic')
## Warning in ets(x, model = etsmodel, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series
# NDVI_ts_train_stl1 <- stlm(NDVI_ts_train, s.window = c(12))
# plot(NDVI_ts_train_stl1) # cannot use plot() here
#Method2: mstl(): Multiple seasonal decomposition: daily, weekly , monthly.
# Use x value
NDVI_multi0 <-msts(SiteNDVI_8yrs_train_monthly$mean_NDVI,seasonal.periods = c(12)) #monthly
NDVI_multi0%>%
mstl() %>%
autoplot()
# use time series (ts) object
NDVI_multi <-msts(NDVI_ts_train,seasonal.periods = c(12)) #monthly
NDVI_multi%>%
mstl() %>%
autoplot()
From the output we can clearly see the seasonal component and separated upward trend of the data.
#Method 1: moving average,Moving-average smoothing
par(mfrow=c(1,1))
ts.plot(NDVI_ts_train)
sm <- ma(NDVI_ts_train, order=12) # 12 month, moving average,Moving-average smoothing
lines(sm, col="blue") # plot
#Method 2: Simple exponential smoothing: Level Only
model <- hw(NDVI_ts_train, initial = "optimal", h=(12), beta=NULL, gamma=NULL) # h is the no. periods to forecast
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
#Method 3:Double Exponential smoothing: Level and Trend components
model <- hw(NDVI_ts_train, initial = "optimal", h=(12), gamma=NULL)
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
#Method 4: Holt Winters: Level, Trend and Seasonality
model <- hw(NDVI_ts_train, initial = "optimal", h=(12))#12 define the forecast Period Length
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
plot(model)
accuracy(model) # calculate accuracy measures
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0005086566 0.0466827 0.03406028 -5.261922 13.40567 0.7002557
## ACF1
## Training set 0.4210418
If both ACF and PACF plots show significant autocorrelation only at lag 0 (i.e., the first value) and decay quickly afterward, it suggests the time series is likely stationary. If the ACF plot shows a slow decay and/or the PACF plot has significant correlations at multiple lags, it might indicate non-stationary. If there is a sinusoidal pattern or gradual decay in the ACF and PACF plots, it might indicate seasonality, suggesting non-stationary.
acf(NDVI_ts_train,lag.max=60,xaxt="n", xlab="Lag (months)", na.action = na.pass)
pacf(NDVI_ts_train,lag.max=60,xaxt="n", xlab="Lag (months)", na.action = na.pass)
Step 6.1: Non-Seasonal ARIMA
########################################1. None seasonal ARIMA-remove seasonality and then focus on the data to removed seasonality
NDVI_ts_train0<-ts(NDVI_ts_train,frequency = 12)
AR0 <- Arima(NDVI_ts_train0, order = c(1,0,0)) #AR(1)
########################################2. Seasonal ARIMA-with seasonality
AR <- Arima(NDVI_ts_train, order = c(2,0,2),seasonal=c(1,1,1))
#AR <- Arima(NDVI_ts_train, order = c(1,0,3),seasonal=c(0,1,1))
#AR <- Arima(NDVI_ts_train, order = c(2,0,2),seasonal=c(0,0,0)) #non-season arima
#plotting the NDVI_ts_train series plus the forecast and 95% prediction intervals
ts.plot(NDVI_ts_train,xlim=c(2013,2024))
AR_forecast <- predict(AR, n.ahead = 24)$pred
AR_forecast_se <- predict(AR, n.ahead = 24)$se
points(AR_forecast, type = "l", col = 2,lwd = 3)
points(AR_forecast - 2*AR_forecast_se, type = "l", col = 2, lty = 2,lwd=2)
points(AR_forecast + 2*AR_forecast_se, type = "l", col= 2, lty = 2,lwd = 2)
## check for residuals
checkresiduals(AR)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(1,1,1)[12]
## Q* = 19.434, df = 18, p-value = 0.3656
##
## Model df: 6. Total lags used: 24
## residuals bell shaped
########################################3. Auto ARIMA:produce the same result as the method2
AutoArimaModel=auto.arima(NDVI_ts_train)
AutoArimaModel #ARIMA(2,0,2)(1,1,1)[12]
## Series: NDVI_ts_train
## ARIMA(1,0,0)(2,1,0)[12]
##
## Coefficients:
## ar1 sar1 sar2
## 0.4843 -0.5576 -0.4961
## s.e. 0.0875 0.0974 0.0877
##
## sigma^2 = 0.002569: log likelihood = 164.17
## AIC=-320.34 AICc=-319.95 BIC=-309.61
############predict
ts.plot(NDVI_ts_train,xlim=c(2013,2024))
AR_prediction <- predict(AutoArimaModel, n.ahead = 24)$pred
AR_prediction_se <- predict(AutoArimaModel, n.ahead = 24)$se
points(AR_prediction, type = "l", col = 2,lwd = 3)
points(AR_prediction - 2*AR_prediction_se, type = "l", col = 2, lty = 2,lwd=2)
points(AR_prediction + 2*AR_prediction_se, type = "l", col= 2, lty = 2,lwd = 2)
############forecast
AR_forecast<-forecast(AutoArimaModel, h=12*2)
NDVI_ts_train%>%
autoplot() +
autolayer(AR_forecast,series = "Multi-Seasonal ARIMA",color = "purple")
###mlst object
NDVI_multi <-msts(NDVI_ts_train,seasonal.periods = c(12)) #monthly
NDVI_multi_ts <- NDVI_multi%>%
mstl()
autoplot(NDVI_multi_ts)
f_arima1<-forecast(NDVI_multi_ts, h=12*2) #ETS(M,N,N)
NDVI_ts_train%>%
autoplot() +
autolayer(f_arima1,series = "Multi-Seasonal Holt_Winter",color = "purple")
# stlm object: multi seasonal ARIMA model, using arima model,ARIMA(0,0,3)
NDVI_stlm<-stlm(NDVI_ts_train, method = "arima")
f_arima2<-forecast(NDVI_stlm, h=12*2) #ARIMA(0,0,3)
NDVI_ts_train%>%
autoplot() +
autolayer(f_arima2,series = "Multi-Seasonal ARIMA",color = "purple")
7. Compare models and validate models
Step 7.1: Compare AIC and BIC values and different errors
The measures calculated are:
ME: Mean Error
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
MPE: Mean Percentage Error
MAPE: Mean Absolute Percentage Error
MASE: Mean Absolute Scaled Error
ACF1: Autocorrelation of errors at lag 1.
By default, the MASE calculation is scaled using MAE of training set naive forecasts for non-seasonal time series, training set seasonal naive forecasts for seasonal time series and training set mean forecasts for non-time series data. If f is a numerical vector rather than a forecast object, the MASE will not be returned as the training data.
AIC_BIC <- rbind(c(f_arima1$model$aic,f_arima1$model$bic),c(f_arima2$model$aic,f_arima2$model$bic),c(AutoArimaModel$aic,AutoArimaModel$bic),c(AR$aic,AR$bic))
colnames(AIC_BIC) <- c("AIC","BIC")
rownames(AIC_BIC) <- c("mlst","stlm_arima","auto_arima","AR")
accuracy(f_arima1)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0008121236 0.04483284 0.03524469 -4.197212 12.69528 0.741506
## ACF1
## Training set 0.07517558
Accuracy<-rbind(accuracy(f_arima1),accuracy(f_arima2),accuracy(AutoArimaModel),accuracy(AR))
rownames(Accuracy) <- c("mlst","stlm_arima","auto_arima","AR")
kable(Accuracy,caption= "Errors (Accuracy) comparison of 4 models")
| ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
|---|---|---|---|---|---|---|---|
| mlst | -0.0008121 | 0.0448328 | 0.0352447 | -4.197212 | 12.69528 | 0.7415060 | 0.0751756 |
| stlm_arima | -0.0005982 | 0.0402946 | 0.0305436 | -5.088403 | 12.46081 | 0.6426015 | 0.0198871 |
| auto_arima | -0.0003873 | 0.0473815 | 0.0343624 | -4.876132 | 13.41552 | 0.7229436 | -0.0500050 |
| AR | 0.0005659 | 0.0406224 | 0.0293476 | -3.891634 | 11.59142 | 0.6174383 | -0.0923885 |
kable(AIC_BIC,caption= "AIC and BIC values comparision of four models")
| AIC | BIC | |
|---|---|---|
| mlst | -165.6161 | -157.2537 |
| stlm_arima | -420.1034 | -411.7660 |
| auto_arima | -320.3402 | -309.6489 |
| AR | -328.8271 | -310.1173 |
NDVI_prediction <- as.data.frame(f_arima1$mean)
predicted_NDVI <- as.numeric(NDVI_prediction$x)
observed_NDVI <- SiteNDVI_3yrs_test_monthly$mean_NDVI
max1<-max(observed_NDVI)
max2<-max(predicted_NDVI)
plot(observed_NDVI,predicted_NDVI[1:length(observed_NDVI)],xlim=c(0,max(max1,max2)),ylim=c(0,max(max1, max2)), xlab="Observed NDVI" , ylab="Predicted NDVI" , col="red", pch=5)
lines(c(0,max(max1, max2)),c(0,max(max1, max2)),col="black",lwd=3,lty=1)
legend(95,40,c('','1:1Line'),lty=c(NA,1),pch=c(NA,NA),lwd=c(NA,3),bg='white',ncol=1,box.lty=0)
lmMod <- lm(observed_NDVI ~ predicted_NDVI[1:length(observed_NDVI)])
summary (lmMod)
##
## Call:
## lm(formula = observed_NDVI ~ predicted_NDVI[1:length(observed_NDVI)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.101633 -0.017430 0.007956 0.023209 0.084237
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00193 0.02495 -0.077 0.939
## predicted_NDVI[1:length(observed_NDVI)] 1.02747 0.04821 21.312 1.05e-15
##
## (Intercept)
## predicted_NDVI[1:length(observed_NDVI)] ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04699 on 21 degrees of freedom
## Multiple R-squared: 0.9558, Adjusted R-squared: 0.9537
## F-statistic: 454.2 on 1 and 21 DF, p-value: 1.049e-15